Using random subspace method for prediction and variable importance assessment in linear regression
نویسندگان
چکیده
A randomsubsetmethod (RSM)with a newweighting scheme is proposed and investigated for linear regression with a large number of features. Weights of variables are defined as averages of squared values of pertaining t-statistics over fitted models with randomly chosen features. It is argued that such weighting is advisable as it incorporates two factors: a measure of importance of the variable within the considered model and a measure of goodness-of-fit of the model itself. Asymptotic weights assigned by such a scheme are determined as well as assumptions under which the method leads to consistent choice of significant variables in the model. Numerical experiments indicate that the proposed method behaves promisingly when its prediction errors are compared with errors of penalty-based methods such as the lasso and it has much smaller false discovery rate than the other methods considered. © 2012 Elsevier B.V. All rights reserved.
منابع مشابه
Extensions to Quantile Regression Forests for Very High-Dimensional Data
This paper describes new extensions to the state-of-the-art regression random forests Quantile Regression Forests (QRF) for applications to high dimensional data with thousands of features. We propose a new subspace sampling method that randomly samples a subset of features from two separate feature sets, one containing important features and the other one containing less important features. Th...
متن کاملComparison of artificial neural network with logistic regression in prediction of tendency to surgical intervention in nurses
Introduction: Logistic regression is one of the modeling methods for bipartite dependent variables. On the other hand, artificial neural network is a flexible method with the least limitation. The importance of growing unnecessary beauty surgeries and the importance of prediction and classification made us consider the present study, with the aim of comparing logistic regression and artificial ...
متن کاملApplication of Linear Regression and Artificial NeuralNetwork for Broiler Chicken Growth Performance Prediction
This study was conducted to investigate the prediction of growth performance using linear regression and artificial neural network (ANN) in broiler chicken. Artificial neural networks (ANNs) are powerful tools for modeling systems in a wide range of applications. The ANN model with a back propagation algorithm successfully learned the relationship between the inputs of metabolizable energy (kca...
متن کاملSupport vector regression with random output variable and probabilistic constraints
Support Vector Regression (SVR) solves regression problems based on the concept of Support Vector Machine (SVM). In this paper, a new model of SVR with probabilistic constraints is proposed that any of output data and bias are considered the random variables with uniform probability functions. Using the new proposed method, the optimal hyperplane regression can be obtained by solving a quadrati...
متن کاملVariable Importance Assessment in Regression: Linear Regression versus Random Forest
Relative importance of regressor variables is an old topic that still awaits a satisfactory solution. When interest is in attributing importance in linear regression, averaging over orderings methods for decomposing R2 are among the state-of-theart methods, although the mechanism behind their behavior is not (yet) completely understood. Random forests—a machinelearning tool for classification a...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Computational Statistics & Data Analysis
دوره 71 شماره
صفحات -
تاریخ انتشار 2014